Adaptive Audio-visual Speech Recognition in the Presence of Audio and Video Distortions
نویسندگان
چکیده
Audio-visual speech recognition leads to significant improvements compared to pure audio recognition especially when the audio signal is corrupted by noise. In this article we investigate the consequences of additional degradations in the video signal on the audio-visual recognition process.. We degrade the images with noise, a JPEG compression, and errors in the localization of the mouth region. The first question we address is how the noise in the video stream influences the recognition scores. Therefore we added noise to both, the audio and video signal at different SNR levels. The second question is how the adaptation of the fusion parameter, controlling the contribution of the audio and video stream to the recognition, is affected by the additional noise in the video stream. We compare the results we obtain when we adapt the fusion parameter to the noise in the audio and video stream to those we get when it is only adapted to the noise in the audio stream and hence a clean video stream is assumed. For the second type of tests we use an automatic adaptation of the fusion parameter based on the entropy of the a-posteriori probabilities from the audio stream.
منابع مشابه
Improved Speech Recognition using Adaptive Audio-visual Fusion via a Stochastic Secondary Classifier
The adaptive fusion of video and audio is one of the fundamental pursuits of audio visual speech recognition (AVSR). In this paper the use of a high dimensional secondary classijier on the word likelihood scores from both the audio and video modalities is investigated fo r the purposes of adaptive fusion. Results are presented that lie above or equal to the boundary of catastrophic fusion acros...
متن کاملP1: Negative Television and Memory
According to reports about 30-thousand people spent watching television had the impact on their memory and recall that the results showed no differences between men and women. The people who watched less than an hour a day did better at every memory function. As these contributors watched negative political ads, physiological responses indicated that their body was reflexively preparing to move...
متن کاملOpen-Domain Audio-Visual Speech Recognition: A Deep Learning Approach
Automatic speech recognition (ASR) on video data naturally has access to two modalities: audio and video. In previous work, audio-visual ASR, which leverages visual features to help ASR, has been explored on restricted domains of videos. This paper aims to extend this idea to open-domain videos, for example videos uploaded to YouTube. We achieve this by adopting a unified deep learning approach...
متن کاملAdaptive Audio-Visual Speech Recognition with Distorted Audio and Video Data
Martin Heckmann , Frédéric Berthommier , Christophe Savariaux , Kristian Kroschel 3 1 Honda Research Institute Europe, 63073 Offenbach, Germany, Email [email protected] 2 Institut de la Communication Parlée (ICP), 38031 Grenoble, France, Email: {bertho, savario}@icp.inpg.fr 3 Institut für Nachrichtentechnik, Universität Karlsruhe, 76128 Karlsruhe, Germany, Email: [email protected]...
متن کاملCharacteristics of the Use of Coupled Hidden Markov Models for Audio-Visual Polish Speech Recognition
This paper focuses on combining audio-visual signals for Polish speech recognition in conditions of highly disturbed audio speech signal. Recognition of audio-visual speech was based on combined hidden Markov models (CHMM). Described methods where developed for a single isolated command, nevertheless their effectiveness indicated that they would also work similarly in continuous audio-visual sp...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003